(54) Video image compression using weighted wavelet hierarchical vector quantization

(57) A weighted wavelet hierarchical vector quantization (WWHVQ) procedure is initiated by obtaining (12)
an N x N pixel image with 8 bits per pixel. A look-up operation (14) is performed to obtain data representing
a discrete wavelet transform (DWT) followed by a quantization of the data. Upon completion of the look-up, a
data compression will have been performed. Further stages and look-ups will result in further compression of
the data, i.e., 4:1, 8:1, 16:1, 32:1, 64:1, etc. Accordingly, a determination (16) is made whether the compression
is complete. If the compression is incomplete, further look-up is performed. If the compression is complete,
however, the compressed data is transmitted (18). Optionally, it is determined (19) at a gateway whether further
compression is required. If so, transcoding is performed (20). The receiver receives (22) the compressed data.
Subsequently, a second look-up operation (24) is performed to obtain data representing an inverse discrete
wavelet transform of the decompressed data. After one iteration, the data is decompressed by a factor of two.
Further iterations allow for further decompression of the data. Accordingly, a determination (26) is made whether
decompression is complete. If the decompression is incomplete, further look-ups are performed; otherwise the
procedure is ended.
Description

This invention relates to a method and apparatus for compressing a video image for transmission to a receiver
and/or decompressing the image at the receiver. More particularly, the invention is directed to an apparatus and method
for performing data compression on a video image using weighted wavelet hierarchical vector quantization (WWHVQ).
WWHVQ advantageously utilizes certain aspects of hierarchical vector quantization (HVQ) and the discrete wavelet
transform (DWT), a subband transform.

A vector quantizer (VQ) is a quantizer that maps k-dimensional input vectors into one of a finite set of k-dimensional
reproduction vectors, or codewords. An analog-to-digital converter, or scalar quantizer, is a special case in which the
quantizer maps each real number to one of a finite set of output levels. Since the logarithm (base 2) of the number of
codewords is the number of bits needed to specify the codeword, the logarithm of the number of codewords, divided by
the vector dimension, is the rate of the quantizer in bits per symbol.

A VQ can be divided into two parts: an encoder and a decoder. The encoder maps the input vector into a binary
code representing the index of the selected reproduction vector, and the decoder maps the binary code into the selected
reproduction vector.

A major advantage of ordinary VQ over other types of quantizers (e.g., transform coders) is that the decoding can
be done by a simple table lookup. A major disadvantage of ordinary VQ with respect to other types of quantizers is that
the encoding is computationally very complex. An optimal encoder performs a full search through the entire set of
reproduction vectors looking for the reproduction vector that is closest (with respect to a given distortion measure) to each
input vector.

For example, if the distortion measure is squared error, then the encoder computes the quantity $\|x - y\|^2$ for each
input vector x and reproduction vector y. This results in essentially M multiply/add operations per input symbol, where
M is the number of codewords. A number of suboptimal, but computationally simpler, vector quantizer encoders have
been studied in the literature. For a survey, see the book by Gersho and Gray, Vector Quantization and Signal
Compression, Kluwer, 1992.
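
As a minimal sketch of the full-search encoder and the rate definition discussed above, the following Python fragment computes the squared-error distance to every codeword and keeps the closest one. The function names and the four-codeword codebook are illustrative assumptions, not part of the described method.

```python
import math

def full_search_encode(x, codebook):
    """Return the index of the codeword nearest to x under squared error."""
    best_index, best_dist = 0, math.inf
    for index, y in enumerate(codebook):
        # One multiply/add per vector component, for each of the M codewords.
        dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def rate_bits_per_symbol(num_codewords, dimension):
    """Rate of a VQ: log2(number of codewords) divided by the vector dimension."""
    return math.log2(num_codewords) / dimension

# Hypothetical 2-dimensional codebook with M = 4 codewords.
codebook = [(0, 0), (0, 255), (255, 0), (255, 255)]
index = full_search_encode((10, 240), codebook)
```

The decoder is the reverse table lookup, `codebook[index]`. The cost of the loop grows linearly with the number of codewords, which is the complexity that the table-lookup encoders described next are intended to avoid.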

Hierarchical vector quantization (HVQ) is VQ that can encode using essentially one table lookup per input symbol.
(Decoding is also done by table lookup.) To the knowledge of the inventors, HVQ has heretofore not appeared in the
literature outside of Chapter 3 of the Ph.D. thesis of P. Chang, Predictive, Hierarchical, and Transform Vector
Quantization for Speech Coding, Stanford University, May 1986, where it was used for speech. Other methods named
"hierarchical vector quantization" have appeared in the literature, but they are unrelated to the HVQ that is considered
respecting the present invention.

The basic idea behind HVQ is the following. The input symbols are finely quantized to p bits of precision. For image
data, p = 8 is typical. In principle it is possible to encode a k-dimensional vector using a single lookup into a table with
a kp-bit address, but such a table would have $2^{kp}$ entries, which is clearly infeasible if k and p are even moderately large.
HVQ performs the table lookups hierarchically. For example, to encode a k = 8 dimensional vector (whose components
are each finely quantized to p = 8 bits of precision) to 8 bits representing one of M = 256 possible reproductions, the
hierarchical structure shown in FIGURE 1a can be used, in which Tables 1, 2, and 3 each have 16-bit inputs and 8-bit
outputs (i.e., they are each 64 KByte tables).

A signal flow diagram for such an encoder is shown in FIGURE 1b. In the HVQ of FIGURE 1b, the tables T at each
stage of the encoder along with the delays Z are illustrated. Each level in the hierarchy doubles the vector dimension
of the quantizer, and therefore reduces the bit rate by a factor of 2. By similar reasoning, the ith level in the hierarchy
performs one lookup per $2^i$ samples, and therefore the total number of lookups per sample is at most 1/2 + 1/4 + 1/8
+ ... = 1, regardless of the number of levels. Of course, it is possible to vary these calculations by adjusting the dimensions
of the various tables.
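
The hierarchical lookup can be sketched as follows. This is only an illustration of the data flow: the table used here is a stand-in that averages its two inputs, not a trained vector quantizer, and the 4-bit precision is chosen to keep the table small. The names `hvq_encode` and `toy_table` are hypothetical.

```python
def hvq_encode(samples, tables, bits):
    """Encode by hierarchical table lookup.

    Each table maps a (2 * bits)-bit address, formed from two neighbouring
    symbols, to a single bits-bit symbol, so every level halves the symbol count.
    """
    symbols = list(samples)
    lookups = 0
    for table in tables:
        pairs = zip(symbols[0::2], symbols[1::2])
        symbols = [table[(a << bits) | b] for a, b in pairs]
        lookups += len(symbols)
    return symbols, lookups

BITS = 4  # toy precision; the text uses p = 8, giving 2**16-entry tables

# Stand-in table (an assumption for illustration): the average of the two inputs.
toy_table = [(a + b) // 2 for a in range(1 << BITS) for b in range(1 << BITS)]

codes, lookups = hvq_encode([0, 2, 4, 6, 8, 10, 12, 14], [toy_table] * 3, BITS)
```

With eight input samples and three levels, the encoder performs 4 + 2 + 1 = 7 lookups, i.e. 7/8 of a lookup per sample, consistent with the bound of one lookup per sample.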

The contents of the HVQ tables can be determined in a variety of ways. A straightforward way is the following. With
reference to FIGURE 1a, Table 1 is simply a table-lookup version of an optimal 2-dimensional VQ. That is, an optimal
2-dimensional full search VQ with M = 256 codewords is designed by standard means (e.g., the generalized Lloyd
algorithm discussed by Gersho and Gray), and Table 1 is filled so that it assigns to each of its $2^{16}$ possible 2-dimensional
input vectors the 8-bit index of the nearest codeword.

Table 2 is just slightly more complicated. First, an optimal 4-dimensional full search VQ with M = 256 codewords is
designed by standard means. Then Table 2 is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input
vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first stage) the 8-bit index of its
nearest codeword. The tables for stages 3 and up are designed similarly. Note that the distortion measure is completely
arbitrary.
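
The table-filling procedure can be sketched in Python under strong simplifying assumptions: 2-bit symbols (so each table has only 16 entries), and small hand-written codebooks in place of codebooks designed by the generalized Lloyd algorithm. All names and codebook values are hypothetical.

```python
def nearest(vector, codebook):
    """Index of the codeword closest to vector in the squared error sense."""
    def dist(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda j: dist(codebook[j]))

def build_table1(codebook2d, bits):
    """Table 1: every possible pair of input samples -> nearest 2-D codeword index."""
    size = 1 << bits
    return [nearest((a, b), codebook2d) for a in range(size) for b in range(size)]

def build_table2(codebook2d, codebook4d):
    """Table 2: every pair of stage-1 indices -> nearest 4-D codeword index.

    The 4-D input vector is the concatenation of the two 2-D codewords that
    the stage-1 indices stand for.
    """
    m = len(codebook2d)
    return [nearest(codebook2d[i] + codebook2d[j], codebook4d)
            for i in range(m) for j in range(m)]

BITS = 2  # toy precision: samples and indices take values 0..3
CB2 = [(0, 0), (0, 3), (3, 0), (3, 3)]                           # assumed 2-D codebook
CB4 = [(0, 0, 0, 0), (0, 0, 3, 3), (3, 3, 0, 0), (3, 3, 3, 3)]   # assumed 4-D codebook

table1 = build_table1(CB2, BITS)
table2 = build_table2(CB2, CB4)
```

Both tables are addressed as `(first << BITS) | second`. The squared-error distance inside `nearest` could be replaced by any other distortion measure without changing the structure.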

A discrete wavelet transformation (DWT), or more generally, a tree-structured subband decomposition, is a method
for hierarchical signal transformation. Little or no information is lost in such a transformation. Each stage of a DWT
involves filtering a signal into a low-pass component and a high-pass component, each of which is critically sampled
(i.e., downsampled by a factor of two). A more general tree-structured subband decomposition may filter a signal into
more than two bands per stage, and may or may not be critically sampled. Here we consider only the DWT, but those
skilled in the art can easily extend the relevant notions to the more general case.

With reference to FIGURE 2a, let $X = (x(0), x(1), \ldots, x(N-1))$ be a 1-dimensional input signal with finite length N. As
shown by the tree structure A, the first stage of a DWT decomposes the input signal $X_{L0} = X$ into the low-pass and
high-pass signals $X_{L1} = (x_{L1}(0), x_{L1}(1), \ldots, x_{L1}(N/2-1))$ and $X_{H1} = (x_{H1}(0), x_{H1}(1), \ldots, x_{H1}(N/2-1))$, each of length N/2. The
second stage decomposes only the low-pass signal $X_{L1}$ from the first stage into the low-pass and high-pass signals
$X_{L2} = (x_{L2}(0), x_{L2}(1), \ldots, x_{L2}(N/4-1))$ and $X_{H2} = (x_{H2}(0), x_{H2}(1), \ldots, x_{H2}(N/4-1))$, each of length N/4. Similarly, the third stage
decomposes only the low-pass signal $X_{L2}$ from the second stage into low-pass and high-pass signals $X_{L3}$ and $X_{H3}$ of
lengths N/8, and so on. It is also possible for successive stages to decompose some of the high-pass signals in addition
to the low-pass signals. The set of signals at the leaves of the resulting complete or partial tree is precisely the transform
of the input signal at the root. Thus a DWT can be regarded as a hierarchically nested set of transforms.

To specify the transform precisely, it is necessary to specify the filters used at each stage. We consider only finite
impulse response (FIR) filters, i.e., wavelets with finite support. L is the length of the filters (i.e., number of taps), and
the low-pass filter (the scaling function) and the high-pass filter (the difference function, or wavelet) are designated by
their impulse responses, $l(0), l(1), \ldots, l(L-1)$, and $h(0), h(1), \ldots, h(L-1)$, respectively. Then at the output of the mth stage,

$$x_{L,m}(i) = l(0)\,x_{L,m-1}(2i) + l(1)\,x_{L,m-1}(2i+1) + \cdots + l(L-1)\,x_{L,m-1}(2i+L-1)$$

$$x_{H,m}(i) = h(0)\,x_{L,m-1}(2i) + h(1)\,x_{L,m-1}(2i+1) + \cdots + h(L-1)\,x_{L,m-1}(2i+L-1)$$

for $i = 0, 1, \ldots, N/2^m - 1$. Boundary effects are handled in some expedient way, such as setting signals to zero outside their
windows of definition. The filters may be the same from node to node, or they may be different.
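
One analysis stage, and its repetition on the low-pass output, can be sketched as below. The 2-tap averaging and differencing filters are an assumption made here to keep the arithmetic exact (they are Haar-like filters, not filters taken from this description), and signals are set to zero outside their windows of definition. The names `dwt_stage` and `dwt` are hypothetical.

```python
def dwt_stage(x, low, high):
    """Filter x with the low-pass and high-pass FIR filters, sliding by two samples."""
    n, taps = len(x), len(low)

    def sample(k):
        # Zero outside the window of definition.
        return x[k] if 0 <= k < n else 0.0

    lo = [sum(low[t] * sample(2 * i + t) for t in range(taps)) for i in range(n // 2)]
    hi = [sum(high[t] * sample(2 * i + t) for t in range(taps)) for i in range(n // 2)]
    return lo, hi

def dwt(x, low, high, stages):
    """Repeatedly decompose only the low-pass signal; collect the high-pass bands."""
    high_bands, current = [], list(x)
    for _ in range(stages):
        current, hi = dwt_stage(current, low, high)
        high_bands.append(hi)
    return current, high_bands

# Assumed 2-tap filters: average and half-difference.
ASSUMED_LOW = (0.5, 0.5)
ASSUMED_HIGH = (0.5, -0.5)

final_low, high_bands = dwt([4, 2, 6, 6], ASSUMED_LOW, ASSUMED_HIGH, stages=2)
```

For this input the first stage yields the low-pass signal [3.0, 6.0] and the high-pass signal [1.0, 0.0]; the second stage decomposes [3.0, 6.0] into [4.5] and [-1.5].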

The inverse transform is performed by different low-pass and high-pass filters, called reconstruction filters, applied
in reverse order. Let $l'(0), l'(1), \ldots, l'(L-1)$ and $h'(0), h'(1), \ldots, h'(L-1)$ be the impulse responses of the inverse filters. Then
$X_{L,m-1}$ can be reconstructed from $X_{L,m}$ and $X_{H,m}$ as:

$$x_{L,m-1}(2i) = l'(0)\,x_{L,m}(i) + l'(2)\,x_{L,m}(i+1) + h'(0)\,x_{H,m}(i) + h'(2)\,x_{H,m}(i+1)$$

$$x_{L,m-1}(2i+1) = l'(1)\,x_{L,m}(i+1) + l'(3)\,x_{L,m}(i+2) + h'(1)\,x_{H,m}(i+1) + h'(3)\,x_{H,m}(i+2)$$

for $i = 0, 1, \ldots, N/2^m - 1$. That is, the low-pass and high-pass bands are upsampled (interpolated) by a factor of two, filtered by
their respective reconstruction filters, and added.

Two-dimensional signals are handled similarly, but with two-dimensional filters. Indeed, if the filters are separable,
then the filtering can be accomplished by first filtering in one dimension (say horizontally along rows), then filtering in
the other dimension (vertically along columns). This results in the hierarchical decompositions illustrated in FIGURES
2b, showing tree structure B, and 2c, in which the odd stages operate on rows, while the even stages operate on
columns. If the input signal $X_{LL0}$ is an N x N image, then $X_{L1}$ and $X_{H1}$ are N x (N/2) images, $X_{LL2}$, $X_{LH2}$, $X_{HL2}$, and $X_{HH2}$
are (N/2) x (N/2) images, and so forth.

Moreover, notwithstanding that which is known about HVQ and DWT, a wide variety of video image compression
methods and apparatuses have been implemented. One existing method that addresses transcoding problems is the
algorithm of J. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Transactions on
Signal Processing, December 1993, in which transcoding can be done simply by stripping off prefixes of codes in the
bit stream. However, this algorithm trades simple transcoding for computationally complex encoding and decoding.

Other known methods lack certain practical and convenient features. For example, these other known video
compression methods do not allow a user to access the transmitted image at different quality levels or resolutions during
an interactive multicast over multiple rate channels in a simplified system wherein encoding and decoding are
accomplished solely by the performance of table lookups.

More particularly, using these other non-embedded encoding video compression algorithms, when a multicast (or
simulcast, as applied in the television industry) of a video stream is accomplished over a network, either every receiver
of the video stream is restricted to a certain quality (and hence bandwidth) level at the sender, or bandwidth (and CPU
cycles or compression hardware) is unnecessarily used by multicasting a number of streams at different bit rates.
In video conferencing (multicast) over a heterogeneous network comprising, for example, ATM, the Internet, ISDN
and wireless, some form of transcoding is typically accomplished at the "gateway" between sender and receiver when
a basic rate mismatch exists between them. One solution to the problem is for the "gateway"/receiver to decompress
the video stream and recompress and scale it according to internal capabilities. This solution, however, is not only
expensive but also increases latency by a considerable amount. The transcoding is preferably done in an online fashion
(with minimal latency/buffering) due to the interactive nature of the application and to reduce hardware/software costs.

From a user's perspective, the problem is as follows: (a) Sender(i) wants to send a video stream at K bits/sec to M
receivers; (b) Receiver(j) wants to receive Sender(i)'s video stream at L bits/sec (L<K); but (c) the image dimensions
that Receiver(j) desires or is capable of processing are smaller than the default dimensions that Sender(i) encoded.
It is desirable that any system and/or method to address these problems in interactive video advantageously
incorporate (1) inexpensive transcoding from a higher to a lower bit rate, preferably by only operating on a compressed
stream, (2) simple bit rate control, (3) simple scalability of dimension at the destination, (4) symmetry (resulting in very
inexpensive decode and encode), and (5) a prioritized compressed stream in addition to acceptable rate-distortion
performance. None of the current standards (Motion JPEG, MPEG and H.261) possess all of these characteristics. In
particular, no current standard has a facility to transcode from a higher to a lower bit rate efficiently. In addition, all are
computationally expensive.

The present invention seeks to overcome the aforenoted and other problems and to incorporate the desired
characteristics noted above. It is particularly directed to the art of video data compression, and will thus be described with
specific reference thereto. It is appreciated, however, that the invention will have utility in other fields and applications.
The present invention provides a method for compressing and transmitting data, the method comprising steps of:
receiving the data; successively performing multiple stages of first lookup operations to obtain compressed data at each
stage representing vector quantized discrete subband coefficients; and, transmitting the compressed data to a receiver.
The method may comprise successively performing i levels of a first table lookup operation to obtain, for example,
$2^i$:1 compressed data representing a subband transform, e.g., DWT, of the input data followed by vector quantization
thereof.

In accordance with another aspect of the present invention, the compressed data (which may also be transcoded
at a gateway) is transmitted to a receiver.

In accordance with another aspect of the present invention, the compressed data is received at a receiver and
multiple stages of a second table lookup operation are performed to selectively obtain decompressed data representing
at least a partial inverse subband transform of the compressed data.

The invention further provides an apparatus for carrying out the methods as set forth above or in accordance with
any of the embodiments described herein.

One advantage of the present invention is that encoding and decoding are accomplished solely by table lookups.
This results in very efficient implementations. For example, this algorithm enables 30 frames/sec encoding (or decoding)
of CIF (320x240) resolution video on Sparc 2 class machines with just 50% CPU loading.

Another advantage of the present invention is that, since only table lookups are utilized, the hardware implemented
to perform the method is relatively simple. An address generator and a limited number of memory chips accomplish the
method. The address generator could be a microsequencer, a gate array, an FPGA or a simple ASIC.

The present invention exists in the construction, arrangement, and combination of the various parts of the device,
whereby the objects contemplated are attained as hereinafter more fully set forth, specifically pointed out in the claims,
and illustrated by way of exemplary embodiments in the accompanying drawings in which:

FIGURE 1a illustrates a table structure of prior art HVQ;

FIGURE 1b is a signal flow diagram illustrating prior art HVQ for speech coding;

FIGURES 2a-c are a graphical representation of a prior art DWT;

FIGURE 3 is a flowchart representing the preferred method of the present invention;

FIGURES 4a-b are graphical representations of a single stage of an encoder performing a WWHVQ in the method
of FIGURE 3;

FIGURE 4c is a signal flow diagram illustrating an encoder performing a WWHVQ in the method of FIGURE 3;

FIGURE 5 is a signal flow diagram of a decoder used in the method of FIGURE 3;

FIGURE 6 is a block diagram of a system using 3-D subband coding in connection with the method of FIGURE 3;

FIGURE 7 is a block diagram of a system using frame differencing in connection with the method of FIGURE 3;

FIGURE 8 is a schematic representation of the hardware implementation of the encoder of the method of FIGURE 3;

FIGURE 9 is a schematic representation of another hardware implementation of the encoder of the method of
FIGURE 3;

FIGURE 10 is a schematic representation of the hardware implementation of the decoder of the method of
FIGURE 3; and

FIGURE 11 is a schematic representation of another hardware implementation of the decoder of the method of
FIGURE 3.

Referring now to the drawings wherein the showings are for the purposes of illustrating the preferred embodiments
of the invention only and not for purposes of limiting same, FIGURE 3 provides a flowchart of the overall preferred
embodiment. It is recognized that the method is suitably implemented in the structure disclosed in the following preferred
embodiment and operates in conjunction with software based control procedures. However, it is contemplated that the
control procedure be embodied in other suitable mediums.

As shown, the WWHVQ procedure is initiated by obtaining, or receiving in the system implementing the method,
input data representing an N x N pixel image where 8 bits represent a pixel (steps 10 and 12). A look-up operation is
performed to obtain data representing a subband transform followed by a vector quantization of the data (step 14). In
the preferred embodiment, a discrete wavelet transform comprises the subband transform. However, it is recognized
that other subband transforms will suffice. Upon completion of the look-up, a data compression has been performed.
Preferably, such compression is 2:1. Further stages will result in further compression of the data, e.g., 4:1, 8:1, 16:1,
32:1, 64:1, etc. It is appreciated that other compression ratios are possible and may be desired in certain applications.
Successive compression stages, or iterations of step 14, compose the hierarchy of the WWHVQ. Accordingly, a
determination is made whether the compression is complete (step 16). If the compression is incomplete, further look-up is
performed. If the desired compression is achieved, however, the compressed data is transmitted using any known
transmitter (step 18). It is then determined at, for example, a network gateway, whether further compression is required (step
19). If so, transcoding is performed in an identical manner as the encoding (step 20). In any event, the receiver eventually
receives the compressed data using known receiving techniques and hardware (step 22). Subsequently, a second
look-up operation is performed to obtain data representing an inverse subband transform, preferably an inverse DWT,
of decompressed data (step 24). After one complete stage, the data is decompressed. Further stages allow for further
decompression of the data to a desired level. A determination is then made whether decompression is complete (step
26). If the decompression is incomplete, further look-ups are performed. If, however, the decompression is complete,
the WWHVQ procedure is ended (step 28).

The embodiment of FIGURE 3 uses a hierarchical coding approach, advantageously incorporating particular
features of hierarchical vector quantization (HVQ) and the discrete wavelet transform (DWT). HVQ is extremely fast, though
its performance directly on images is mediocre. On the other hand, the DWT is computationally demanding, though it
is known to greatly reduce blocking artifacts in coded images. The DWT coefficients can also be weighted to match the
human visual sensitivity in different frequency bands. This results in even better performance, since giving higher weights
to the more visually important bands ensures that they will be quantized to a higher precision. The present invention
combines HVQ and the DWT in a novel manner, to obtain the best qualities of each in a single system, Weighted Wavelet
HVQ (WWHVQ).

The basic idea behind WWHVQ is to perform the DWT filtering using table lookups. Assume the input symbols have
already been finely quantized to p bits of precision. For monochrome image data, p = 8 is typical. (For color image data,
each color plane can be treated separately, or they can be vector quantized together into p bits.) In principle it is possible
to filter the data with an L-tap filter with one lookup per output symbol using a table with an Lp-bit address space. Indeed,
in principle it is possible to perform both the low-pass filtering and the high-pass filtering simultaneously, by storing both
low-pass and high-pass results in the table. Of course, such a table is clearly infeasible if L and p are even moderately
large. We take an approach similar to that of HVQ: for a filter of length L, $\log_2 L$ "substages" of table lookup can be used,
if each table has 2p input bits and p output bits. The p output bits of the final substage represent a 2-dimensional vector:
one symbol from the low-pass band and a corresponding symbol from the high-pass band. Thus, the wavelet coefficients
output from the table are vector quantized. In this sense, the DWT is tightly integrated with the HVQ. The wavelet
coefficients at each stage of the DWT are vector quantized, as are the intermediate results (after each substage) in the
computation of the wavelet coefficients by table lookup.

FIGURE 4a shows one stage i of the WWHVQ, organized as $\log_2 L$ substages of table lookup. Here, L = 4, so that
the number of substages is two. Note that the filters slide over by two inputs to compute each output. This corresponds
to downsampling (decimation) by a factor of 2, and hence a reduction in bit rate by a factor of 2. The second stage of the
WWHVQ operates on coded outputs from the first stage, again using $\log_2 L$ substages of table lookup (but with different
tables than in the first stage) and so on for the following stages. It is recognized that each of a desired number of stages
operates in substantially the same way so that the p bits at the output of stage i represent a $2^i$:1 compression. The p
bits at the output of the final stage can be transmitted directly (or indirectly via a variable-length code). The transmitted
data can be further compressed, or "transcoded", for example, at a gateway between high and low capacity networks,
simply by further stages of table lookup, each of which reduces the bit rate by a factor of 2. Hence both encoding and
transcoding are extremely simple.
The tables in the first stage (i = 1) may be designed as follows, with reference to FIGURE 4a. In this discussion, we
shall assume that the input signal $X = (x(0), x(1), \ldots, x(N-1))$ is one-dimensional. Those skilled in the art will have little
difficulty generalizing the discussion to image data. First decompose the input signal $X_{L0} = X$ into low-pass and high-pass
signals $X_{L1}$ and $X_{H1}$, each of length N/2. This produces a sequence of 2-dimensional vectors
$x(i) = [x_{L1}(i), x_{H1}(i)]$, for $i = 0, 1, \ldots, N/2-1$, where

$$x_{L1}(i) = l(0)x(2i) + l(1)x(2i+1) + l(2)x(2i+2) + l(3)x(2i+3)$$

and

$$x_{H1}(i) = h(0)x(2i) + h(1)x(2i+1) + h(2)x(2i+2) + h(3)x(2i+3)$$

Such 2-dimensional vectors are used to train an 8-bit vector quantizer $Q_{1.2}$ for Table 1.2. Likewise, 2-dimensional vectors
$[l(0)x(2i) + l(1)x(2i+1),\; h(0)x(2i) + h(1)x(2i+1)]$
are used to train an 8-bit vector quantizer $Q_{1.1a}$ for Table 1.1a, and 2-dimensional vectors
$[l(2)x(2i+2) + l(3)x(2i+3),\; h(2)x(2i+2) + h(3)x(2i+3)]$
are used to train an 8-bit vector quantizer $Q_{1.1b}$ for Table 1.1b. All three quantizers are trained to minimize the expected
weighted squared error distortion measure $d(x,y) = [w_{L1}(x_0 - y_0)]^2 + [w_{H1}(x_1 - y_1)]^2$, where the constants $w_{L1}$ and $w_{H1}$
are proportional to the human perceptual sensitivity (i.e., inversely proportional to the just noticeable contrast) in the
low-pass and high-pass bands, respectively. Then Table 1.1a is filled so that it assigns to each of its $2^{16}$ possible
2-dimensional input vectors $[x_0, x_1]$ the 8-bit index of the codeword $[y_{L1.1a}, y_{H1.1a}]$ of $Q_{1.1a}$ nearest to
$[l(0)x_0 + l(1)x_1,\; h(0)x_0 + h(1)x_1]$ in the weighted squared error sense. Table 1.1b is filled so that it assigns to each of its $2^{16}$ possible
2-dimensional input vectors $[x_2, x_3]$ the 8-bit index of the codeword $[y_{L1.1b}, y_{H1.1b}]$ of $Q_{1.1b}$ nearest to
$[l(2)x_2 + l(3)x_3,\; h(2)x_2 + h(3)x_3]$ in the weighted squared error sense. And finally Table 1.2 is filled so that it assigns to each of its $2^{16}$ possible
4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first stage), for
example, $[y_{L1.1a}, y_{H1.1a}, y_{L1.1b}, y_{H1.1b}]$, the 8-bit index of the codeword $[y_{L1}, y_{H1}]$ of $Q_{1.2}$ nearest to
$[y_{L1.1a} + y_{L1.1b},\; y_{H1.1a} + y_{H1.1b}]$ in the weighted squared error sense.

For a small cost in performance, it is possible to design the tables so that Table 1.1a and Table 1.1b are the same,
for instance, Table i.1 with i = 1 in FIGURE 4b. In this case, Table 1.1 is simply a table lookup version of a 2-dimensional
VQ that best represents pairs of inputs $[x_0, x_1]$ in the ordinary (unweighted) squared error sense. Then Table 1.2 is filled
so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible
2-dimensional output vectors from the first stage), for example, $[y_0, y_1, y_2, y_3]$, the 8-bit index of the codeword $[y_{L1}, y_{H1}]$ of $Q_{1.2}$
nearest to $[l(0)y_0 + l(1)y_1 + l(2)y_2 + l(3)y_3,\; h(0)y_0 + h(1)y_1 + h(2)y_2 + h(3)y_3]$ in the weighted squared error sense. Making
Tables 1.1a and 1.1b the same would result in a savings of both table memory and computation, as shown in FIGURE
4b. The corresponding signal flow diagram is shown in FIGURE 4c.
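
The shared-table design can be illustrated with a deliberately tiny Python sketch. Everything numeric here is an assumption made for illustration: 2-bit symbols instead of 8-bit ones, hand-written codebooks instead of trained quantizers, and simple 4-tap filters and band weights chosen so the arithmetic is exact. The sketch only shows how a shared Table 1.1 and a Table 1.2 could be filled and then used with the filter window sliding by two inputs.

```python
# Assumed 4-tap filters and band weights (not values from this description).
LOW = (0.25, 0.25, 0.25, 0.25)
HIGH = (0.25, 0.25, -0.25, -0.25)
W_LOW, W_HIGH = 1.0, 0.5
BITS = 2  # toy precision: samples and codes take values 0..3

Q11 = [(0, 0), (0, 3), (3, 0), (3, 3)]                   # assumed substage-1 codebook
Q12 = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (1.5, 1.5)]   # assumed [low, high] codewords

def nearest(target, codebook, weights):
    """Index of the codeword nearest to target in the weighted squared error sense."""
    def dist(codeword):
        return sum((w * (t - c)) ** 2 for w, t, c in zip(weights, target, codeword))
    return min(range(len(codebook)), key=lambda j: dist(codebook[j]))

def build_table_1_1():
    """Shared first-substage table: input pair -> index, unweighted squared error."""
    size = 1 << BITS
    return [nearest((a, b), Q11, (1.0, 1.0)) for a in range(size) for b in range(size)]

def build_table_1_2():
    """Second-substage table: pair of substage-1 codes -> index of [low, high] codeword."""
    table = []
    for c0 in range(1 << BITS):
        for c1 in range(1 << BITS):
            y = Q11[c0] + Q11[c1]  # [y0, y1, y2, y3]
            target = (sum(l * v for l, v in zip(LOW, y)),
                      sum(h * v for h, v in zip(HIGH, y)))
            table.append(nearest(target, Q12, (W_LOW, W_HIGH)))
    return table

def encode_stage(samples, table_1_1, table_1_2):
    """One stage: the 4-sample filter window slides by two inputs per output."""
    pairs = zip(samples[0::2], samples[1::2])
    codes = [table_1_1[(a << BITS) | b] for a, b in pairs]
    codes.append(table_1_1[0])  # code of an all-zero pair, used past the signal end
    return [table_1_2[(codes[i] << BITS) | codes[i + 1]] for i in range(len(codes) - 1)]

T11, T12 = build_table_1_1(), build_table_1_2()
encoded = encode_stage([3, 3, 3, 3, 0, 0, 0, 0], T11, T12)
```

Because the first-substage table is shared, each first-substage lookup is reused by two neighbouring outputs, so the eight input samples cost four lookups in each substage and produce four output codes.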

Referring again to FIGURE 4a, the tables in the second stage (i = 2) are just slightly more complicated. Decompose
the input signal $X_{L1}$, of length N/2, into low-pass and high-pass signals $X_{L2}$ and $X_{H2}$, each of length N/4. This produces
a sequence of 4-dimensional vectors $x(i) = [x_{L2}(i), x_{H2}(i), x_{H1}(2i), x_{H1}(2i+1)]$, for $i = 0, 1, \ldots, N/4-1$, where
$x_{L2}(i) = l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1) + l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3)$ and
$x_{H2}(i) = h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1) + h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3)$.
Such 4-dimensional vectors are used to train an 8-bit vector quantizer $Q_{2.2}$ for Table 2.2. Likewise, 4-dimensional
vectors $[l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1),\; h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1),\; x_{H1}(2i),\; x_{H1}(2i+1)]$ are used to train an 8-bit vector
quantizer $Q_{2.1a}$ for Table 2.1a, and 4-dimensional vectors
$[l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3),\; h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3),\; x_{H1}(2i+2),\; x_{H1}(2i+3)]$
are used to train an 8-bit vector quantizer $Q_{2.1b}$ for Table 2.1b. All three quantizers are trained
to minimize the expected weighted squared error distortion measure
$d(x,y) = [w_{L2}(x_0 - y_0)]^2 + [w_{H2}(x_1 - y_1)]^2 + [w_{H1}(x_2 - y_2)]^2 + [w_{H1}(x_3 - y_3)]^2$,
where the constants $w_{L2}$, $w_{H2}$ and $w_{H1}$ are proportional to the human perceptual sensitivities in
their respective bands. Then Table 2.1a is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors
$[y_0, y_1, y_2, y_3]$ the 8-bit index of the codeword of $Q_{2.1a}$ nearest to $[l(0)y_0 + l(1)y_1,\; h(0)y_0 + h(1)y_1,\; y_2,\; y_3]$ in the weighted
squared error sense, and so on for Tables 2.1b and 2.2 in the second stage, and the tables in any succeeding stages.
As in HVQ, in WWHVQ the vector dimension doubles with each succeeding stage. The formats of the vectors at
each stage are shown as outputs of the encoder 30, graphically represented in FIGURE 4c. These formats are for the
case of the two-dimensional separable DWT shown in FIGURE 2b.

Referring now to the case where all tables in a given substage are identical, as in FIGURES 4b and 4c, if the number
of inputs to a stage is S bytes, then the number of outputs is:

$$S/2$$

and the number of table lookups per output is $\log_2 L$. If the compression ratio is C, then the total number of outputs
(including the outputs of intermediate stages) is:

$$N^2 (C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$N^2 (C-1) \log_2 L / C$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ encoder
is $T \log_2 L$ per stage.
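
The output and lookup totals can be checked numerically. The sketch below assumes base-2 logarithms, a power-of-two compression ratio, and one 2:1 reduction per stage; the function names are hypothetical. It evaluates the closed-form expression and compares it with a direct sum over the stages.

```python
import math

def encoder_totals(n, compression_ratio, filter_length):
    """Closed form: total outputs N^2 (C-1)/C and lookups = outputs * log2(L)."""
    outputs = n * n * (compression_ratio - 1) // compression_ratio
    lookups = outputs * int(math.log2(filter_length))
    return outputs, lookups

def encoder_outputs_by_stage(n, compression_ratio):
    """Direct count: each stage outputs half as many symbols as it receives."""
    stages = int(math.log2(compression_ratio))
    symbols, per_stage = n * n, []
    for _ in range(stages):
        symbols //= 2
        per_stage.append(symbols)
    return per_stage

outputs, lookups = encoder_totals(256, 64, 4)
per_stage = encoder_outputs_by_stage(256, 64)
```

For a 256 x 256 image compressed 64:1 with 4-tap filters, the six stages output 32768, 16384, 8192, 4096, 2048 and 1024 symbols, which sum to 64512, and at two lookups per output the total is 129024 lookups.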

Also shown in FIGURE 4c are the respective delays Z which successively increase with each stage. The oval symbol
including the ↓2 designation indicates that only one of every two outputs is selected, as those skilled in the art will
appreciate.

The WWHVQ decoder 50 which performs steps 22-26 of FIGURE 3 is shown in FIGURE 5. All tables of decoder
50 in a given substage are identical, similar to the encoder of FIGURES 4b and 4c. As those skilled in the art will
appreciate, the odd-even split tables 52, 54 at each stage handle the interpolation by 2 that is part of the DWT reconstruction.
If L = 4 and the filter coefficients are h(i) and l(i), i = 0, 1, 2, 3, then the odd table 52 computes h(1)x(i) + h(3)x(i+1) and
l(1)x(i) + l(3)x(i+1), while the even table 54 computes h(0)x(i) + h(2)x(i+1) and l(0)x(i) + l(2)x(i+1), where the x(i)'s are the
inputs to that stage. If the number of inputs to a stage is S, then the number of outputs from that stage is 2S. The total
number of table lookups for the stage is:

$$S \log_2 (L/2)$$

If the compression ratio is C, then the total number of outputs (including outputs of intermediate stages) is:

$$2N^2 (C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$(N^2 (C-1)/C) \log_2 (L/2)$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ decoder
per stage is:

$$T(\log_2 (L/2) + 1) = T \log_2 L$$
All the storage requirements presented are for 8 bits per pixel input. For color images (YUV, 4:2:2 format) the storage
requirements double. Similarly, the number of table lookups also doubles for color images.

As shown in FIGURES 6 and 7, there are two options for handling motion and interframe coding using the present
method. In a first mode (FIGURE 6), the subband coding is extended and followed by a vector quantization scheme that
allows for intraframe coding performed as described in connection with encoder 30 to interframe coding, designated by
reference numeral 80. This is similar to 3-D subband coding.

The second way (FIGURE 7) of handling motion is to use a simple frame differencing scheme that operates on the
compressed data and uses an ordered codebook to decide whether to transmit a certain address. Specifically, the
WWHVQ encoder 30 shown in FIGURE 4c is used in conjunction with a frame differencer 70. The frame differencer 70
uses a current frame encoded by encoder 30 and a previous frame 72 as inputs to obtain a result.

Some of the features of the WWHVQ of the present invention are:

Transcoding 

The sender transmits a video stream compressed at 16:1. The receiver requests the "gateway" for a 32:1 stream
(or the gateway sees that the slower link can only handle 32:1). All the gateway has to do to achieve this transcoding
is a further level of table lookup using the data it receives (at 16:1) as input. This is extremely useful, especially when
a large range of bandwidths has to be supported, as, for example, in a heterogeneous networked environment.
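A rough sketch of this gateway step follows. It assumes that two 8-bit codewords are concatenated into a 16-bit address into a 64K-entry table, and it pairs consecutive codewords, whereas the actual stages alternate between rows and columns. The function name `transcode_one_level` is hypothetical, and the averaging table is only a placeholder for a trained WWHVQ table.

```python
def transcode_one_level(codes, table):
    """Halve the stream: each pair of 8-bit codewords addresses `table` to give one codeword."""
    if len(codes) % 2:
        raise ValueError("expected an even number of codewords")
    return [table[(codes[i] << 8) | codes[i + 1]] for i in range(0, len(codes), 2)]

# Placeholder table (assumption): a real gateway would load the next-stage WWHVQ table.
toy_table = [(a + b) // 2 for a in range(256) for b in range(256)]

stream_16_to_1 = [10, 20, 30, 50]
stream_32_to_1 = transcode_one_level(stream_16_to_1, toy_table)
```

The gateway never decodes the stream: it performs one lookup per output codeword, so the cost of going from 16:1 to 32:1 is a single extra table stage.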

Dimension Scaling

If the video stream is compressed up to J stages, then the receiver has a choice of (J/2) + 1 image sizes without any
extra effort. For example, if a 256 x 256 image is compressed to 6 stages (i.e., 64:1), then the receiver can reconstruct
a 32 x 32 or 64 x 64 or 128 x 128 or 256 x 256 image without any overhead. This is done by just using the low pass
bands LL available at the even numbered stages (see FIGURE 3). Also, since the whole method is based upon table
look-ups, it is very easy to scale the image up, i.e., the interpolation and lowpass filtering can be done ahead of time.
Note also that all this is done on the compressed bit-stream itself. In standard video compression schemes both down
and up scaling is achieved by explicitly filtering and decimating or interpolating the input or output image.
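The list of available sizes can be enumerated with a short sketch. It assumes each stage contributes 2:1 compression (so a 64:1 stream corresponds to six stages, consistent with the 2^i:1 relation in the claims) and that a low-pass band is usable after every second stage, since stages alternate between rows and columns. The function name `available_sizes` is hypothetical.

```python
def available_sizes(n, stages):
    """Side lengths recoverable from the LL bands at stage 0 and each even-numbered stage."""
    return [n >> (s // 2) for s in range(0, stages + 1, 2)]

sizes = available_sizes(256, 6)
```

Here `sizes` holds the four side lengths 256, 128, 64 and 32 for a 256 x 256 image compressed to 64:1.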

Motion

There are two simple options for handling motion. Simple frame differencing can be applied to the compressed data
itself. Another option is to use a 3-D subband coding scheme in which the temporal dimension is handled using another
WWHVQ in conjunction with the spatial WWHVQ. The wavelet used here can be different (it is different in practice). In
a preferred implementation, motion detection and thresholding are accomplished. This is done on the compressed data
stream. The current compressed frame and the previous compressed frame are compared using a table lookup. This
generates a binary map which indicates places where motion has occurred. This map is used to decide which blocks
(codewords) to send, and a zero is inserted in place of stationary blocks. This whole process is done in an online manner
with a run-length encoder. The table that is used to create the binary map is constructed off-line and the motion threshold
is set at that time.
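The frame differencing path can be sketched as follows. The sketch assumes 8-bit codewords and an ordered codebook, so that the absolute difference of two codeword indices stands in for the distance between the blocks they represent. The function names, the table layout, and the threshold value are illustrative assumptions rather than details taken from the description.

```python
def build_motion_table(threshold):
    """Off-line: 64K-entry table holding 1 where two codewords differ by more than threshold."""
    return bytes(1 if abs(c - p) > threshold else 0
                 for c in range(256) for p in range(256))

def frame_difference(curr, prev, table):
    """On-line: one lookup per block; stationary blocks are replaced by zero."""
    return [c if table[(c << 8) | p] else 0 for c, p in zip(curr, prev)]

def run_length_encode(values):
    """Collapse consecutive equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

motion_table = build_motion_table(threshold=4)
current = [10, 10, 50, 12, 12]
previous = [10, 11, 10, 12, 30]
sent = frame_difference(current, previous, motion_table)
encoded = run_length_encode(sent)
```

Because the threshold is baked into the table when it is built, changing the motion sensitivity means swapping the table rather than changing the on-line logic.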



Typically, the decoder has to do color space conversion and possibly dithering (for < 24-bit displays) on the decoded
stream before it can display it. This is quite an expensive task compared to the complexity of the WWHVQ decoder. But
since the WWHVQ decoder is also based upon table lookups, these steps are incorporated into the decoder output
tables. Other than speeding up the decoder, the other major advantage of this technique is that it makes the decoder
completely independent of the display (resolution, depth, etc.). Thus, the same compressed stream can be decoded by
the same decoder on two different displays by just using different output tables.
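A minimal sketch of folding the display conversion into the last decoder table is given below. The codebook contents, the two conversion functions, and all names are placeholders chosen for illustration; a real output table would hold the final-stage reconstruction values already converted (and, if needed, dithered) for the target display.

```python
def make_output_table(codebook, convert):
    """Off-line: apply the display conversion once to every table entry."""
    return [convert(value) for value in codebook]

def decode_last_stage(codes, output_table):
    """On-line: one lookup per codeword, with no per-pixel conversion work."""
    return [output_table[c] for c in codes]

codebook = list(range(256))  # placeholder reconstruction values (assumption)
rgb_table = make_output_table(codebook, lambda y: (y, y, y))
one_bit_table = make_output_table(codebook, lambda y: 1 if y >= 128 else 0)

codes = [0, 200]
on_rgb_display = decode_last_stage(codes, rgb_table)
on_1bit_display = decode_last_stage(codes, one_bit_table)
```

The same `codes` drive both displays; only the table passed to the last stage differs.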

The WWHVQ method is very simple and inexpensive to implement in hardware. Basically, since the method
predominantly uses only table lookups, only memory and address generation logic is required. The architectures described
below differ only in the amount of memory they use versus the amount of logic. Since alternate stages operate on row
and column data, there is a need for some amount of buffering between stages. This is handled explicitly in the
architectures described below. All the storage requirements are given for 8 bits per pixel input. For color images (YUV, 4:2:2
format) the storage requirements double. But the timing information remains the same, since the luminance and
chrominance paths can be handled in parallel in hardware. Also, as noted above, two simple options are available for interframe
coding in WWHVQ. The 3-D subband coding option is implemented as an additional WWHVQ module, while the frame
differencing option is implemented as a simple comparator.

Referring now to FIGURE 8, each one of the tables i.1 (or i.1a and i.1b) and i.2 of the present invention is mapped
onto a memory chip (64KB in this case). The row-column alternation between stages is handled by using a buffer 60 of
NL bytes between stages, where N is the row dimension of the image and L is the length of the wavelet filter. For example,
between stages 1 and 2 this buffer is written in row format (by stage 1) and read in column format by stage 2. Some
simple address generation logic is required. Accordingly, address generator 90 is provided. The address generator 90
is any suitable device such as an incrementer, an accumulator, or an adder plus some combinational glue logic. However,
this architecture is almost purely memory. The input image is fed as the address to the first memory chip, whose output
is fed as the address to the next memory chip, and so on.

The total memory requirements for the encoder 30 and the decoder 50 are T log L + NL(M - 1) bytes, where M is
the number of stages. For example, the WWHVQ encoder and decoder shown in FIGURES 3 and 5 need 64KB for each
table i.1, i.2 plus the buffer memory 60. If the number of stages, M, is 6, the wavelet filter size, L, is 4 and the image row
dimension, N, is 256, then the amount of memory needed is 768KB + 5KB = 773KB. The number of 64KB chips needed
is 12 and the number of 1KB chips needed is 5. The throughput is obviously maximal, i.e., one output every clock cycle.
The latency per stage is NL clocks, except for the first stage, which has a latency of 1 clock cycle. Thus the latency after
the m stages is (m-1)NL + 1 clocks.
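The figures in this example can be reproduced with a small sketch. The value T = 384KB is inferred from the stated 768KB total with L = 4 and a base-2 logarithm; it is not given explicitly, and the function name `flow_through_cost` is hypothetical.

```python
import math

def flow_through_cost(t_bytes, n, l, m):
    """Return (table_bytes, buffer_bytes, latency_clocks) for an m-stage flow-through design."""
    table_bytes = t_bytes * round(math.log2(l))  # T log L, log base 2 assumed
    buffer_bytes = n * l * (m - 1)               # one NL-byte buffer between stages
    latency_clocks = (m - 1) * n * l + 1         # NL per later stage, 1 for the first
    return table_bytes, buffer_bytes, latency_clocks

KB = 1024
tables, buffers, latency = flow_through_cost(384 * KB, n=256, l=4, m=6)
```

With these inputs the tables occupy 768KB (twelve 64KB chips), the buffers 5KB (five 1KB chips), and the latency after six stages is 5121 clocks.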

The main advantage of this architecture is that it requires almost no glue logic and is a simple flow through
architecture. The disadvantages are that the number of chips required scales with the number of stages and the board area
required becomes quite large. The capacity of each chip in this architecture is quite small, and one can easily buy a
cheap memory chip which has the capacity of all these chips combined. This is considered in the architecture of FIGURE

Referring now to FIGURE 9, if all the tables are loaded onto one memory chip 100, then an address generator 102
and a frame buffer 104 are used. In this architecture one stage of the WWHVQ is computed at a time. In fact, each
sub-level of a stage is computed one at a time. So in the encoder 30 the table lookups for table i.1 are done first, the



bytes. The frame buffer 104 also permits simple row-column alternation between stages.

The address generator 102 has to generate two addresses, one for the table memory chip 106 and another for the
frame memory chip 104. The address for the frame memory chip 104 is generated by a simple incrementer, since the
access is uniform. If the number of taps in the wavelet filter is L, then the number of levels per stage (i.e., wavelet tables)
for the encoder 30 is log L (it is log L/2 for the decoder). Each of the tables i.1 and i.2 is of size tsize. Suppose the output of
level q of stage m is being computed. If the output of the frame memory 104 is y(i), then the address that the address generator
102 computes for the table memory 106 is y(i) + offset, where offset = [(m - 1) log L + (q - 1)] * tsize. This is assuming the
tables i.1, i.2 are stored in an ordered manner. The multiplications involved in the computation of offset need not be
done, since offset can be maintained as a running total. Each time one level of computation is completed, offset = offset
+ tsize. Thus, the address generator 102 is any suitable device such as an incrementer, an accumulator, or an adder
plus some combinational glue logic.
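The running-total form of the offset can be sketched as below. The sketch assumes tables are stored contiguously in stage-then-level order, as the text requires; the names `table_offsets` and `table_address` are hypothetical.

```python
def table_offsets(stages, levels_per_stage, tsize):
    """Yield (m, q, offset) in computation order, adding tsize after each level."""
    offset = 0
    for m in range(1, stages + 1):
        for q in range(1, levels_per_stage + 1):
            yield m, q, offset
            offset += tsize  # no multiplication needed: a running total suffices

def table_address(y, offset):
    """Address presented to the table memory for frame-memory output y."""
    return y + offset

offsets = list(table_offsets(stages=6, levels_per_stage=2, tsize=65536))
```

Each yielded offset equals [(m - 1) * levels_per_stage + (q - 1)] * tsize, which is the closed form given above with log L = 2, so an accumulator is enough to implement the generator.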

The total memory requirement for this architecture is:

T log L + N^2/2

bytes spread out over two chips, the frame memory 104 and the table memory 106. For the example considered in the
previous section this translates to 800KB of memory (768KB + 32KB). The throughput is one output every clock cycle.
The latency per stage is:

(N^2/2^(m-1)) log L

where m is the stage number. The first stage has a latency of just log L. Thus, the latency after m stages is:

N^2 (1 - (1/2^(m-1))) log L + 1
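As a hedged check of these expressions, the sketch below evaluates the memory total and the cumulative latency for the same example (N = 256, L = 4, six stages). T = 384KB is inferred from the 768KB table total, the logarithm is taken base 2, and the function name `single_chip_cost` is hypothetical.

```python
import math

def single_chip_cost(t_bytes, n, l, m):
    """Return (total_bytes, latency_clocks) after m stages for the single-table-chip design."""
    log_l = round(math.log2(l))                       # log base 2 assumed
    total_bytes = t_bytes * log_l + n * n // 2        # T log L + N^2/2
    latency_clocks = (n * n - n * n // 2 ** (m - 1)) * log_l + 1
    return total_bytes, latency_clocks

KB = 1024
total, latency = single_chip_cost(384 * KB, n=256, l=4, m=6)
```

This yields 800KB of memory and 126977 clocks of latency; converting the latter to time requires a clock rate, which the description does not state.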

The advantages of this architecture are its scalability, simplicity and low chip count. By using a large memory chip
100 (approximately 2MB or more), various configurations of the encoder and the decoder, i.e., various table sizes and
precision, may be considered, and the number of stages can be scaled up or down without having to change anything
on the board. The only disadvantage is latency, but in practice the latency is well below 50 milliseconds, which is the
threshold above which humans start noticing delays.

It is important to note the connection between the requirement of a half frame memory 104 and the latency. The
latency is there primarily due to the fact that all the outputs of a stage are computed before the next stage computation
begins. These intermediate outputs are stored in the frame memory 104. The reason the latency was low in the previous
architecture was that the computation proceeded in a flow through manner, i.e., computation of the outputs of stage
m began before all the outputs of stage m - 1 were computed.

FIGURE 10 illustrates the architecture for the decoder similar to the encoder of FIGURE 8. As shown, an address
generator 90' is connected to a buffer 60' having inputs of the odd and even tables 52 and 54. The output of the buffer
connects to the next stage and the address generator also connects to the next buffer.

FIGURE 11 shows the architecture for the decoder similar to the encoder of FIGURE 9. As illustrated, the interframe
coder 80' is simply placed at the input to the chip 100', as opposed to the output.

The memory chips utilized to facilitate the look-up tables of the present invention are preferably read only memories
(ROMs). However, it is recognized that other suitable memory means such as PROM, EPROM, EEPROM, RAM, etc.
may be utilized.



Claims 

1. A method for compressing and transmitting data, the method comprising steps of:

receiving the data;

successively performing multiple stages of first lookup operations to obtain compressed data at each stage
representing vector quantized discrete subband coefficients; and,
transmitting the compressed data to a receiver.

2. The method according to claim 1 further comprising: 

receiving the compressed data at the receiver;

successively performing multiple stages of second lookup operations to selectively obtain at each stage before
a last stage decompressed data representing a partial inverse subband transform of the compressed data.

3. The method of claim 1 or 2, wherein i lookup stages are performed and the compressed data is 2^i:1 compressed
data, where i is an integer.

4. The method according to claim 1, 2 or 3, wherein the subband transform coefficients comprise discrete wavelet
transform coefficients.

5. The method of any of claims 1 to 4, further comprising transcoding the compressed data after the transmitting at a
gateway to the receiver to obtain further compressed data.

6. A method adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising steps of:
receiving the compressed signal data;

successively performing multiple stages of lookup operations to selectively obtain at each stage before a last
stage partially decompressed data representing a partial subband transform of the compressed signal data.

7. The apparatus for compressing and transmitting data, comprising:
means for receiving the data;

means for successively performing multiple stages of first lookup operations to obtain compressed data at
each stage representing vector quantized discrete subband coefficients; and, means for transmitting the compressed
data to a receiver.

8. The apparatus according to claim 7, further comprising:

means for receiving the compressed data at the receiver; and

means for successively performing multiple stages of second lookup operations to selectively obtain at each
stage before a last stage decompressed data representing a partial inverse subband transform of the compressed
data.

9. An apparatus adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising:

means for receiving the compressed signal data;

means for successively performing multiple stages of lookup operations to selectively obtain at each stage
before a last stage partially decompressed data representing a partial subband transform of the compressed signal
data.



10. A programmable apparatus for compressing and receiving data, when suitably programmed for carrying out the
method of any of claims 1 to 6.